# Load necessary librarieslibrary(tidyverse)library(patchwork)library(cowplot)library(car)library(lme4)library(lmerTest)library(emmeans)library(knitr)library(viridis)# Set theme for consistent plottingtheme_set(theme_cowplot())# Custom theme for consistent plottingmy_theme <-theme_minimal() +theme(legend.position ="none",plot.title =element_text(face ="bold", size =10),axis.title =element_text(size =9),axis.text =element_text(size =8) )
Introduction
This document provides a comparison between one-way ANOVA and nested ANOVA using the andrew dataset from Quinn & Keough (2002). This tutorial aims to help students understand how the partitioning of variance differs between these two approaches and why accounting for nested factors is crucial in certain experimental designs.
In the andrew dataset, the experimental design consists of:
Four urchin density treatments (TREAT): Control, 66% Density, 33% Density, and Removed
Each treatment was replicated within four random patches (PATCH)
Five replicate quadrats were measured within each treatment-patch combination
The response variable is percentage cover of filamentous algae
Dataset Overview
First, let’s load and explore the dataset:
# Read in the andrew datasetandrew <-read_csv("data/andrew.csv")# Convert TREAT to factor with meaningful labelsandrew$TREAT <-factor(andrew$TREAT, levels =c("con", "t0.66", "t0.33", "rem"),labels =c("Control", "66% Density", "33% Density", "Removed"))# Convert PATCH to factorandrew$PATCH <-factor(andrew$PATCH)# Display the first few rows of the datasethead(andrew)
# A tibble: 6 × 4
TREAT PATCH QUAD ALGAE
<fct> <fct> <dbl> <dbl>
1 Control 1 1 0
2 Control 1 2 0
3 Control 1 3 0
4 Control 1 4 6
5 Control 1 5 2
6 Control 2 1 0
Important Note: The ANOVA table above does not use the correct error terms for testing the treatment effect. In a nested design, the treatment effect should be tested against the patch variation, not the residual error.
Corrected Nested ANOVA with Proper F-tests
Let’s calculate the correct F-ratios and p-values for the nested design:
Source df SS MS F p
1 Treatment 3 14429.14 4809.712 2.717102 9.13e-02
2 Patches (within treatment) 12 21241.95 1770.162 5.928207 <0.001
3 Residual 64 19110.40 298.600 NA <NA>
With the corrected nested ANOVA, we find:
The treatment effect is not significant (F = 2.72, p = 0.0913) when tested against the patch variation.
There is significant variation among patches within treatments (F = 5.93, p < 0.001)
This is a different conclusion than the one-way ANOVA, which found a significant treatment effect.
Variance Decomposition Comparison
Visual Decomposition of Variance Components
First, let’s create a visual representation of how variance is partitioned in a standard one-way ANOVA, and then contrast it with how a nested ANOVA further divides the variance components.
[1] "Treatment means:"
# A tibble: 4 × 2
TREAT treat_mean
<fct> <dbl>
1 Control 1.3
2 66% Density 21.6
3 33% Density 19
4 Removed 39.2
[1] "First few rows of joined patch_means:"
# A tibble: 6 × 4
TREAT PATCH patch_mean treat_mean
<fct> <fct> <dbl> <dbl>
1 Control 1 1.6 1.3
2 Control 2 0 1.3
3 Control 3 1 1.3
4 Control 4 2.6 1.3
5 66% Density 5 28.4 21.6
6 66% Density 6 36.8 21.6
[1] "treat_mean column is present in patch_means"
The plots above visually demonstrate the key differences in how variance is partitioned between one-way and nested ANOVA:
One-way ANOVA (first plot):
Total variance is split into just two components: Among Groups (Treatment) and Within Groups (Error)
The Within Groups component includes all variation not explained by treatments
Nested ANOVA (second plot):
Total variance is split into three components: Among Treatments, Among Patches within Treatments, and Within Patches (Residual Error)
The important addition is the “Among Patches within Treatments” component, which captures the spatial heterogeneity
The actual residual error (Within Patches) is smaller than the Within Groups error in one-way ANOVA
This visualization demonstrates why we get different conclusions: in one-way ANOVA, the patch-to-patch variation is incorrectly included in the error term, leading to an artificially inflated F-ratio for treatments.
Numerical Decomposition of Variance
# Calculate sums of squares for both modelsSS_Total <-sum(nested_summary[[1]][, "Sum Sq"])SS_Treatment <- nested_summary[[1]]["TREAT ", "Sum Sq"]SS_Patch_within_Treatment <- nested_summary[[1]]["TREAT:PATCH", "Sum Sq"]SS_Error_Nested <- nested_summary[[1]]["Residuals", "Sum Sq"]SS_Error_OneWay <- oneway_summary[[1]]["Residuals", "Sum Sq"]# Calculate percentages for visualizationpercent_treatment_oneway <- (SS_Treatment / SS_Total) *100percent_error_oneway <- (SS_Error_OneWay / SS_Total) *100percent_treatment_nested <- (SS_Treatment / SS_Total) *100percent_patch_nested <- (SS_Patch_within_Treatment / SS_Total) *100percent_error_nested <- (SS_Error_Nested / SS_Total) *100# Create data frame for visualizationss_comparison <-data.frame(Model =c(rep("One-way ANOVA", 2), rep("Nested ANOVA", 3) ),Source =c("Treatment", "Error (within)","Treatment", "Patch(Treatment)", "Error" ),Percent =c( percent_treatment_oneway, percent_error_oneway, percent_treatment_nested, percent_patch_nested, percent_error_nested ),SS =c( SS_Treatment, SS_Error_OneWay, SS_Treatment, SS_Patch_within_Treatment, SS_Error_Nested ))# Add factor levels for orderingss_comparison$Source <-factor( ss_comparison$Source,levels =c("Treatment", "Patch(Treatment)", "Error", "Error (within)"))# Create colors for the sourcessource_colors <-c("Treatment"="#1b9e77", "Patch(Treatment)"="#d95f02", "Error"="#7570b3","Error (within)"="#7570b3")# Create the stacked bar plotp1 <-ggplot(ss_comparison, aes(x = Model, y = Percent, fill = Source)) +geom_bar(stat ="identity", position ="stack", color ="black") +scale_fill_manual(values = source_colors) +labs(title ="Comparison of Variance Partitioning",subtitle ="One-way vs. Nested ANOVA",x ="",y ="Percentage of Total Variance",fill ="Source of Variation" ) +geom_text(aes(label =sprintf("%.1f%%", Percent)),position =position_stack(vjust =0.5),color ="white",fontface ="bold" ) +theme(plot.title =element_text(face ="bold", size =14),legend.position ="right",axis.text.x =element_text(face ="bold", size =12) )# Create a pie chart version for one-way ANOVAoneway_data <- ss_comparison %>%filter(Model =="One-way ANOVA")oneway_data$ypos <-cumsum(oneway_data$Percent) -0.5* oneway_data$Percentp2 <-ggplot(oneway_data, aes(x ="", y = Percent, fill = Source)) +geom_bar(stat ="identity", width =1, color ="black") +coord_polar("y", start =0) +labs(title ="One-way ANOVA",subtitle ="Variance Components",fill ="Source of Variation" ) +scale_fill_manual(values = source_colors) +geom_text(aes(y = ypos, label =sprintf("%.1f%%", Percent)),color ="white",fontface ="bold" ) +theme_void() +theme(plot.title =element_text(face ="bold", size =14, hjust =0.5),plot.subtitle =element_text(hjust =0.5),legend.position ="bottom" )# Create a pie chart version for nested ANOVAnested_data <- ss_comparison %>%filter(Model =="Nested ANOVA")nested_data$ypos <-cumsum(nested_data$Percent) -0.5* nested_data$Percentp3 <-ggplot(nested_data, aes(x ="", y = Percent, fill = Source)) +geom_bar(stat ="identity", width =1, color ="black") +coord_polar("y", start =0) +labs(title ="Nested ANOVA",subtitle ="Variance Components",fill ="Source of Variation" ) +scale_fill_manual(values = source_colors) +geom_text(aes(y = ypos, label =sprintf("%.1f%%", Percent)),color ="white",fontface ="bold" ) +theme_void() +theme(plot.title =element_text(face ="bold", size =14, hjust =0.5),plot.subtitle =element_text(hjust =0.5),legend.position ="bottom" )# Display all plotsp1
# Combine the pie chartsp2 + p3 +plot_layout(ncol =2)
Mixed-Effects Model Approach
A modern way to analyze nested designs is to use mixed-effects models. Let’s compare the results with our previous analyses:
# Fit linear mixed model with PATCH nested in TREATmixed_model <-lmer(ALGAE ~ TREAT + (1|TREAT:PATCH), data = andrew)# Show model summarysummary(mixed_model)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: ALGAE ~ TREAT + (1 | TREAT:PATCH)
Data: andrew
REML criterion at convergence: 682.2
Scaled residuals:
Min 1Q Median 3Q Max
-1.9808 -0.3106 -0.1093 0.2831 2.5910
Random effects:
Groups Name Variance Std.Dev.
TREAT:PATCH (Intercept) 294.3 17.16
Residual 298.6 17.28
Number of obs: 80, groups: TREAT:PATCH, 16
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 1.300 9.408 12.000 0.138 0.8924
TREAT66% Density 20.250 13.305 12.000 1.522 0.1539
TREAT33% Density 17.700 13.305 12.000 1.330 0.2081
TREATRemoved 37.900 13.305 12.000 2.849 0.0147 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) TREAT6D TREAT3D
TREAT66%Dns -0.707
TREAT33%Dns -0.707 0.500
TREATRemovd -0.707 0.500 0.500
# ANOVA-style resultsanova(mixed_model, type =3, ddf ="Satterthwaite")
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
TREAT 2434 811.33 3 12 2.7171 0.09126 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The mixed model approach gives us similar conclusions to the correctly specified nested ANOVA. The treatment effect is non-significant when accounting for the nested structure of patches within treatments.
Visualizing the Nested Structure
One way to understand why we get different results is to visualize the data by patch within treatment:
# Calculate means for each patchpatch_means <- andrew %>%group_by(TREAT, PATCH) %>%summarize(ALGAE_mean =mean(ALGAE), .groups ="drop")# Plot patch meansggplot() +# Plot patch meansgeom_point(data = patch_means, aes(x = TREAT, y = ALGAE_mean, color = PATCH),position =position_jitterdodge(jitter.width =0.1, dodge.width =0.7),size =3) +# Add treatment meansgeom_point(data = summary_stats, aes(x = TREAT, y = mean),size =5, shape =18, color ="black") +# Add error bars for treatment meansgeom_errorbar(data = summary_stats,aes(x = TREAT, ymin = mean - se, ymax = mean + se),width =0.2, color ="black", linewidth =1) +# Plot stylingscale_color_viridis_d(option ="plasma", end =0.9) +labs(title ="Nested Structure: Patch Means within Treatments",subtitle ="Large symbols show treatment means (± SE)",x ="Urchin Density Treatment",y ="Mean Filamentous Algae Cover (%)",color ="Patch" ) +theme(legend.position ="right",plot.title =element_text(face ="bold") )
This plot clearly shows the high variation among patches within each treatment. This patch-to-patch variation contributes substantially to the total variance, which is not accounted for in the one-way ANOVA.
F-ratio Demonstration
Let’s illustrate how the F-ratio for treatments differs between one-way and nested ANOVA:
# Create a data frame for comparisonf_ratio_comparison <-data.frame(Model =c("One-way ANOVA", "Nested ANOVA"),F_numerator =c("MS Treatment", "MS Treatment"),F_denominator =c("MS Residual", "MS Patch(Treatment)"),F_value =c(oneway_summary[[1]][1, "F value"], F_treat),p_value =c(oneway_summary[[1]][1, "Pr(>F)"], p_treat))# Display the comparisonf_ratio_comparison
Model F_numerator F_denominator F_value p_value
1 One-way ANOVA MS Treatment MS Residual 9.058658 3.362391e-05
2 Nested ANOVA MS Treatment MS Patch(Treatment) 2.717102 9.126200e-02
Key Differences and Implications
The comparison between one-way ANOVA and nested ANOVA reveals several important differences:
Variance Partitioning:
In one-way ANOVA, all variation not explained by treatments is lumped into the “Error” term.
In nested ANOVA, this variation is partitioned into “Patch(Treatment)” and “Error” components.
F-ratio for Testing Treatment Effects:
One-way ANOVA tests treatments against residual error (MS Treatment / MS Residual).
Nested ANOVA tests treatments against patch variation (MS Treatment / MS Patch(Treatment)).
Nested ANOVA reveals that most variation is among patches, with non-significant treatment effects (p = 0.0913).
The substantial patch variability (38.8% of total variation) masks the treatment effect when properly accounted for.
Statistical Power and Type I Error:
Ignoring the nested structure leads to pseudoreplication and inflated Type I error rates.
The one-way ANOVA effectively treats each quadrat as an independent sample, inflating the degrees of freedom for the error term.
Conclusion
This demonstration illustrates why accounting for hierarchical or nested structures in experimental designs is crucial for valid statistical inference. Failure to account for such structures can lead to:
Pseudoreplication (treating non-independent samples as independent)
Inflation of Type I error rates (finding spurious “significant” effects)
Incorrect partitioning of variance sources
Misleading biological interpretations
In the andrew dataset, the nested ANOVA reveals that spatial heterogeneity (patch-to-patch variation) is the dominant factor influencing algae cover, not the experimental treatment. This insight would be missed if only a one-way ANOVA were used.
Alternative Code for Mixed Models
For advanced users, we could also fit this as a mixed model with lmerTest:
library(lmerTest)# Mixed model with patches nested within treatmentsmixed_model_alt <-lmer(ALGAE ~ TREAT + (1|TREAT:PATCH), data = andrew)anova(mixed_model_alt, type =3, ddf ="Satterthwaite")
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
TREAT 2434 811.33 3 12 2.7171 0.09126 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# We could also try a simpler random effects structuresimple_mixed_model <-lmer(ALGAE ~ TREAT + (1|PATCH), data = andrew)anova(simple_mixed_model, type =3, ddf ="Satterthwaite")
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
TREAT 2434 811.33 3 12 2.7171 0.09126 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Mixed models provide a more flexible approach to handle nested designs and are the recommended approach in modern statistical practice.